Conversation

ldematte
Contributor

@ldematte ldematte commented Oct 6, 2025

In order to better understand the performance characteristics of vector indexing with a GPU, this PR introduces 2 changes:

  • changes to KnnIndexTester (more logging, support for different write buffer sizes in input, support for async-profiler)
  • more logging in the GPU codec

@ldematte ldematte added >non-issue auto-backport Automatically create backport pull requests when merged :Search Relevance/Vectors Vector search v9.2.1 v9.3.0 labels Oct 6, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 6, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

if (System.getenv("DO_PROFILING") != null) {
jvmArgs '-XX:StartFlightRecording=dumponexit=true,maxsize=250M,filename=knn.jfr,settings=profile.jfc'
}
def asyncProfilerPath = System.getProperty("asyncProfiler.path", null)
Contributor

I am uncertain about this async-profiler part, but I trust your expertise on this.

Contributor

@mayya-sharipova mayya-sharipova left a comment

@ldematte Thanks Lorenzo, makes sense. I've left some comments to address, but otherwise it looks good.

Comment on lines +62 to +74
if (asyncProfilerPath != null) {
    if (OS.current().equals(OS.MAC)) {
        def asyncProfilerAgent = "${asyncProfilerPath}/lib/libasyncProfiler.dylib"
        println "Using async-profiler agent ${asyncProfilerAgent}"
        jvmArgs "-agentpath:${asyncProfilerAgent}=start,event=cpu,interval=10ms,file=${layout.buildDirectory.asFile.get()}/tmp/elasticsearch-0_%t_%p.jfr"
    } else if (OS.current().equals(OS.LINUX)) {
        def asyncProfilerAgent = "${asyncProfilerPath}/lib/libasyncProfiler.so"
        println "Using async-profiler agent ${asyncProfilerAgent}"
        jvmArgs "-agentpath:${asyncProfilerAgent}=start,event=cpu,interval=10ms,wall=50ms,file=${layout.buildDirectory.asFile.get()}/tmp/elasticsearch-0_%t_%p.jfr"
    } else {
        println "Ignoring 'asyncProfiler.path': not available on ${OS.current()}"
    }
}
Member

I am cool with this. However, why don't we add wall to Mac as well?

Contributor Author

I tried it and had to back off. Looking at the error I got and at the async-profiler code, apparently only Linux has an implementation that uses perf events, which lets you record both cpu time and wall time at the same time. On Mac, the underlying engine is less flexible/precise, and you can have one or the other.

I'm wondering: maybe I should add an option for that, like a -DasyncProfiler.event= property that defaults to cpu but can be changed to wall, so Mac users can have this choice?

Comment on lines +110 to +111
iwc.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
iwc.setRAMBufferSizeMB(writerBufferSizeInMb);
Member

We need to be careful: by default we should benchmark with ES defaults. Optimizing our benchmarks but not our production code can give a false sense of improvement.

Contributor Author

setRAMBufferSizeMB uses the same default as before, but now I'm not sure it matches what ES uses. I'll check.
As for the number of docs... I'll check that too. I assumed these settings are mutually exclusive, but it might be that they are "first met wins" instead.
If that's the case, I'll default to what ES has and add an input option for that too.
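
For reference, a minimal sketch (not the PR's actual code) of how those two IndexWriterConfig triggers interact in Lucene: the default flush policy (FlushByRamOrCountsPolicy) fires on whichever enabled trigger is reached first, i.e. "first met wins", and DISABLE_AUTO_FLUSH turns a trigger off. The class and parameter names below are illustrative.

import org.apache.lucene.index.IndexWriterConfig;

class FlushTriggerSketch {
    static IndexWriterConfig configureFlushTriggers(double writerBufferSizeInMb) {
        IndexWriterConfig iwc = new IndexWriterConfig();

        // Disable the doc-count trigger so that only the RAM buffer size decides
        // when a segment is flushed; otherwise both triggers stay active and the
        // first one reached causes the flush.
        iwc.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);

        // Lucene's own default is IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB (16 MB);
        // Elasticsearch sizes its indexing buffer via indices.memory.index_buffer_size,
        // so an explicit, configurable value lets the benchmark match either setup.
        iwc.setRAMBufferSizeMB(writerBufferSizeInMb);
        return iwc;
    }
}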

} catch (IOException ioe) {
    throw new UncheckedIOException(ioe);
}
logger.debug("Index thread times: [{}ms] read, [{}ms] add doc", readTime / 1_000_000, docAddTime / 1_000_000);
Member

If we are going to worry about this nuance, we should simply adjust this to return the time each thread spent indexing and sum those up in the results.

Additionally, we would need to separate the call to:

ConcurrentMergeScheduler cms = (ConcurrentMergeScheduler) iwc.getMergeScheduler();
cms.sync();

and maybe have an "exact index time" vs "overall index time" or something.
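
A minimal sketch of what that split might look like (the class, method, and task names are illustrative, not KnnIndexTester's actual code): each indexing thread reports the time it spent adding documents, those are summed into an "exact index time", and the wall-clock "overall index time" also covers waiting for merges via ConcurrentMergeScheduler#sync.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriterConfig;

class IndexTimingSketch {
    // indexTask stands in for a per-thread indexing loop that returns the
    // nanoseconds it spent adding documents to the IndexWriter.
    static void measure(ExecutorService executor, int numThreads, IndexWriterConfig iwc, Callable<Long> indexTask)
        throws Exception {
        long startNanos = System.nanoTime();

        // Sum the time each thread spent indexing: the "exact index time".
        List<Future<Long>> perThread = new ArrayList<>();
        for (int i = 0; i < numThreads; i++) {
            perThread.add(executor.submit(indexTask));
        }
        long exactIndexNanos = 0;
        for (Future<Long> future : perThread) {
            exactIndexNanos += future.get();
        }

        // Wait for background merges after the per-thread measurement, so merge time
        // shows up only in the overall (wall-clock) figure.
        ConcurrentMergeScheduler cms = (ConcurrentMergeScheduler) iwc.getMergeScheduler();
        cms.sync();
        long overallIndexNanos = System.nanoTime() - startNanos;

        System.out.printf(
            "exact index time: %d ms, overall index time: %d ms%n",
            TimeUnit.NANOSECONDS.toMillis(exactIndexNanos),
            TimeUnit.NANOSECONDS.toMillis(overallIndexNanos)
        );
    }
}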

Contributor Author

@ldematte ldematte Oct 7, 2025

I added this because I was seeing weird variance on the machine I'm using (on AWS) and wanted to see if read/write performance was at the root of the issue.
I'm not sure there is value in keeping this, other than looking at the logs and saying "ah, yes, this run has issues with reading, I'll treat it as an outlier" (or not, depending on what you are measuring).
But I do agree that a plain log is not the best tool for this; I'll make the change you suggest. It makes sense and it's not too much of a change.
